IEEE Access — Latest Matching Preprints

1

Vision Language Model for Coronary Angiogram Analysis and Report Generation: Development and Evaluation Study

Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.

2026-04-21 cardiovascular medicine 10.64898/2026.04.19.26351241 medRxiv

Top 0.2%

3.1%

Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard for its diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate angiogram interpretation, with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes from 1987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptor weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, negative predictive value of 0.58 and specificity of 0.52. This study demonstrates the potential of using a VLM to streamline angiogram interpretation to rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions.
AUTHOR SUMMARY: Coronary artery disease carries a heavy disease burden worldwide, and coronary angiography is the gold-standard imaging modality for its diagnosis. Interpreting these complex images and producing clinical reports requires significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, and full report generation. We also benchmarked the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, which was evaluated on the same test sets. We examined how machine learning metrics like the intersection over union score may not fully capture the clinical accuracy of model predictions and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could potentially assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculations of clinical scores such as the SYNTAX score.
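
As a quick consistency check on the figures quoted in this abstract, the F1-score is the harmonic mean of precision and recall. The sketch below is not the authors' evaluation code; the function name `f1_score` is illustrative. Note that a weighted F1 is usually the support-weighted mean of per-class F1 scores, so the harmonic mean of the weighted precision and recall need not reproduce the reported anatomy value exactly, although here it happens to agree after rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract above.
stenosis_f1 = f1_score(0.56, 0.64)   # stenosis detection
anatomy_f1 = f1_score(0.50, 0.43)    # anatomy task (weighted precision/recall)

print(round(stenosis_f1, 2), round(anatomy_f1, 2))  # prints 0.6 0.46
```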

2

Analysis and Mitigation of Equipment-induced Shortcuts in AI Models for Laparoscopic Cholecystectomy

Protserov, S.; Repalo, A.; Mashouri, P.; Hunter, J.; Masino, C.; Madani, A.; Brudno, M.

2026-04-24 surgery 10.64898/2026.04.22.26351545 medRxiv

Top 0.2%

2.9%

Machine learning models have seen considerable success in medical image segmentation. However, one of the challenges they face is confounders, or shortcuts: spurious correlations or biases in the training data that affect the resulting models. One example of such confounders in surgical machine learning is the setup of surgical equipment, including tools and lighting. Using the identification of safe and dangerous zones of dissection in laparoscopic cholecystectomy images and videos as a use case, we inspect two equipment-induced biases: the presence of surgical tools in the field of view and the position of lighting. We propose methods for evaluating the severity of these biases and augmentation-based methods for mitigating them. We show that our tool bias mitigations improve the models' consistency under tool movements by 9 percentage points in the most inconsistent cases, and by 4 percentage points on average. Our lighting bias mitigations help reduce the fraction of true dangerous zone pixels that may be predicted as safe under light changes from 5% to 1.5%, without compromising segmentation quality.

3

QRS Detection by Combinatorial Optimization With MLP Assisted Peak Scoring

Hopenfeld, B.

2026-04-22 bioengineering 10.64898/2026.04.19.719501 medRxiv

Top 0.3%

1.9%

A multiple channel QRS detector is described. The detector partitions raw signal segments into peak domains, extracts parameters associated with the peak domains, and scores peaks based on these parameters. A multi-layer perceptron (MLP) with 11 inputs generates provisional peak scores, which are refined through application of rules involving 20-30 parameters. An optimal sequence of supra threshold peaks is determined. Separately, combinatorial optimization determines an optimal structured heart rhythm sequence. Adjudication between the general supra threshold sequence and the structured sequence depends on noise level, peak quality, and rhythm structure quality. For multiple channel fusion, peak scores are determined as a noise weighted function of channel peak scores. The MLP was trained on approximately 70% of channel 1 of the MIT-BIH Arrhythmia Database. The supplementary rules were heuristically chosen over all channel 1 records. Sensitivity (SE) and positive predictive value (PPV) of the detector applied to channel 2 were a function of the noise threshold used to discard segments. At a noise level that would exclude 2.2% of channel 1 data, the SE and PPV were 99.67% and 99.75% respectively. Importantly, even in high noise, the detector was able to track large scale features of heart rhythm. Fused channel 1 and channel 2 SE and PPV were 99.96% and 99.98% respectively. The present algorithm points the way toward maximal extraction of heart rhythm information from noisy signals, and the potential to reduce false alarms generated by automated rhythm analysis software.
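
For readers less familiar with the two detection metrics quoted here, sensitivity (SE) is the fraction of true beats that were detected, and positive predictive value (PPV) is the fraction of detections that were true beats. The sketch below uses made-up beat counts purely for illustration; they are not taken from the paper, and the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """SE = TP / (TP + FN): share of annotated beats the detector found."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP): share of detections that match a real beat."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only (not from the paper).
tp, fn, fp = 990, 10, 5
print(f"SE  = {sensitivity(tp, fn):.2%}")
print(f"PPV = {positive_predictive_value(tp, fp):.2%}")
```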

4

Multimodal Integration of Ambulatory ECG and Clinical Features for Sudden Cardiac Death and Pump Failure Death Prediction

Swee, S.; Adam, I.; Zheng, E. Y.; Ji, E.; Wang, D.; Speier, W.; Hsu, J.; Chang, K.-W.; Shivkumar, K.; Ping, P.

2026-04-22 cardiovascular medicine 10.64898/2026.04.21.26351421 medRxiv

Top 0.4%

1.7%

Ambulatory electrocardiograms (ECG) provide continuous monitoring of the heart's electrical activity. However, many existing machine learning and artificial intelligence models for analyzing ambulatory ECG traces are unimodal and do not incorporate patient clinical context. In this study, we propose a multimodal framework integrating ambulatory ECG-derived representations with clinical text embeddings to predict two cardiac outcomes: sudden cardiac death and pump failure death. Ambulatory ECG traces are preprocessed, segmented, and encoded via a multiple instance learning and temporal convolutional neural network framework. In parallel, patient clinical features are parsed into structured prompts, which are passed through a large language model to generate clinical reasoning; this reasoning passes through a biomedical language encoder to generate a text embedding. With the ECG and text embeddings, we systematically evaluate multiple fusion strategies, including concatenation- and gating-based approaches, to integrate these two data modalities. Our results demonstrate that multimodal models consistently outperform unimodal baselines, with adaptive fusion mechanisms providing the greatest improvements in predictive performance. Decision curve analysis highlights the potential clinical utility of the proposed framework for risk stratification. Finally, we visualize model attention across modalities, including ECG attention patterns, segment-level saliency, heart rate variability features, and clinical reasoning, to contextualize patient-specific predictions.
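
The two fusion families named in this abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' architecture: it assumes both embeddings have the same length, uses a single scalar gate computed from the concatenated embeddings, and the names `concat_fusion`, `gated_fusion`, `weights`, and `bias` are hypothetical. In a trained model the gate parameters would be learned.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def concat_fusion(ecg: list[float], text: list[float]) -> list[float]:
    """Concatenation fusion: stack both embeddings into one longer vector."""
    return list(ecg) + list(text)


def gated_fusion(ecg: list[float], text: list[float],
                 weights: list[float], bias: float) -> list[float]:
    """Scalar-gate fusion (assumed form): g * ecg + (1 - g) * text.

    The gate g is a sigmoid of a linear function of the concatenated
    embeddings, so the mix between modalities adapts to each input.
    """
    if len(ecg) != len(text):
        raise ValueError("this sketch assumes equal embedding sizes")
    joint = concat_fusion(ecg, text)
    if len(weights) != len(joint):
        raise ValueError("need one weight per concatenated feature")
    gate = sigmoid(sum(w * v for w, v in zip(weights, joint)) + bias)
    return [gate * e + (1.0 - gate) * t for e, t in zip(ecg, text)]


# Toy embeddings; zero weights and bias give a gate of exactly 0.5.
fused = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0] * 4, 0.0)
print(fused)  # prints [2.0, 2.0]
```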

5

Wearable Dual-Modality Plethysmography for Arterial Modulation and Blood Pressure Dip

Jung, S.; Thomson, S.

2026-04-21 physiology 10.64898/2026.04.17.719282 medRxiv

Top 0.5%

1.4%

Continuous, non-invasive cardiovascular monitoring is limited by the superficial sensing depth of photoplethysmography (PPG), which is susceptible to peripheral artifacts. This study evaluates a wearable dual-modality prototype integrating dry-electrode impedance plethysmography (IPG) and PPG within a smartwatch form factor. Results from a pilot study (N=2) demonstrate that IPG signals exhibit a temporal lead over PPG across ventral and dorsal sites, supporting its greater penetration depth. During brachial artery modulation, IPG showed superior sensitivity to arterial recovery on the ventral forearm. Furthermore, 60-minute napping sessions revealed that while PPG remained morphologically stable, IPG signals underwent significant evolution, capturing distinct pulse-wave archetypes. These findings suggest that wearable IPG provides a high-fidelity window into deep systemic hemodynamics typically reserved for clinical instrumentation.

6

How can AI be compatible with evidence-based medicine?: with an example of analysis of lung cancer recurrence

Usuzaki, T.; Matsunbo, E.; Inamori, R.

2026-04-25 radiology and imaging 10.64898/2026.04.17.26351114 medRxiv

Top 0.7%

0.9%

Despite the remarkable progress of artificial intelligence represented by large language models, how AI technologies can contribute to the construction of evidence in evidence-based medicine (EBM) remains an overlooked issue. We now need AI that is compatible with EBM. In the present paper, we propose an example analysis that may contribute to this approach using a variable Vision Transformer.

7

Feature-Based Parametric Response Mapping on Thoracic Computed Tomography for Robust Disease Classification in COPD

Namvar, A.; Shan, B.; Hoff, B.; Labaki, W. W.; Murray, S.; Bell, A. J.; Galban, S.; Kazerooni, E. A.; Martinez, F. J.; Hatt, C. R.; Han, M. K.; Galban, C. J.; Ram, S.

2026-04-27 radiology and imaging 10.64898/2026.04.24.26351675 medRxiv

Top 0.8%

0.8%

Purpose: To develop an interpretable feature-based Deep Parametric Response Mapping (PRMD) method that combines wavelet scattering convolution networks and machine learning to spatially detect and quantify functional small airways disease (fSAD) and emphysema on paired inspiratory-expiratory CT scans, with enhanced noise robustness. Materials and Methods: In this retrospective analysis of prospectively acquired data (2007-2017), we developed and validated a deep learning-based PRM approach using paired CT scans from 8,972 tobacco-exposed COPDGene participants (≥10 pack-years; mean age 60.1 ± 8.8 years; 46.5% women), including controls with normal spirometry (n = 3,872), PRISm (n = 1,089), and GOLD 1-4 COPD (n = 4,011). Data were stratified into training, validation, and testing sets (24:6:70). PRMD extracts translation-invariant image features using a wavelet scattering network and applies a subspace learning classifier to classify voxels as emphysema or non-emphysematous air trapping (fSAD). PRMD was compared with conventional density-based PRM for voxel-wise agreement, correlation with pulmonary function, robustness to noise, and sensitivity to misregistration using Pearson correlation, Bland-Altman analysis, and paired t tests. Results: PRMD achieved 95% voxel-wise agreement with standard PRM (r = 0.98) while demonstrating significantly greater robustness under noise. PRMD showed stronger correlations with FEV1 (emphysema: r = -0.54; fSAD: r = -0.51; P < 0.0001) than standard PRM (r = -0.42 for both; P < 0.0001). Under simulated high-noise conditions, standard PRM overestimated disease by ~15%, whereas PRMD limited error to < 5% (P < 0.001). Conclusion: PRMD provides an interpretable, feature-driven and noise-resilient alternative to traditional PRM for emphysema and fSAD classification, enhancing the reliability of CT-based COPD phenotyping for multi-center studies and low-dose imaging applications.
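
The Bland-Altman analysis mentioned in this abstract summarizes agreement between two measurement methods by the mean of their paired differences (the bias) and the 95% limits of agreement, conventionally bias ± 1.96 standard deviations of the differences. The sketch below shows that standard calculation only; the function name and the sample values are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical per-subject disease percentages from two methods.
new_method = [12.1, 25.4, 8.3, 30.2, 17.9]
reference = [11.5, 26.0, 8.0, 29.1, 18.4]
bias, lower, upper = bland_altman(new_method, reference)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```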

8

Assessing ageing, cognitive ability and freezing of gait in Parkinson's disease through integrated brain-heart network dynamics

Pitti, L.; Sitti, G.; Candia-Rivera, D.

2026-04-23 neurology 10.64898/2026.04.22.26351482 medRxiv

Top 0.9%

0.8%

Parkinson's Disease (PD) is a complex neurodegenerative disorder that manifests through systemic, large-scale physiological reorganizations. While research often focuses on region-specific neural changes, there is a growing need for multidomain approaches to capture the complexity of the disease and its clinical heterogeneity. This study proposes an analytical pipeline to evaluate Brain-Heart Interplay (BHI) as a novel systemic biomarker for neurodegeneration and healthy ageing. We assessed BHI across three open-source datasets (EEG and ECG signals). We compared Healthy Young, Healthy Elderly, and PD patients in resting state to investigate the effects of ageing and cognitive performance. Additionally, we studied BHI trends in PD patients at the moment of freezing of gait (FOG). Methodologically, brain network organization was quantified using coherence-based EEG connectivity and graph theory, while heart activity was analyzed through Poincaré plot-derived measures of cardiac autonomic activity. The coupling between these two systems was measured using the Maximal Information Coefficient to capture linear and non-linear dependencies between global cortical organization and cardiac autonomic outflow. The results demonstrate that BHI is a sensitive biomarker for detecting early multisystem dysfunction in both neurodegeneration and ageing. Furthermore, the identification of specific BHI trends during FOG onset suggests new opportunities for understanding the physiological mechanisms driving motor complications in PD. Our proposed pipeline provides a guiding tool for large-scale physiological assessment in clinical research.
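
A Poincaré plot places each heartbeat interval against the next one. Its two standard descriptors are SD1 (spread across the identity line, reflecting beat-to-beat variability) and SD2 (spread along the identity line, reflecting longer-term variability). The abstract does not say which Poincaré-derived measures the authors used, so the sketch below only illustrates the textbook SD1/SD2 calculation; the function name and RR values are hypothetical.

```python
import math
import statistics


def poincare_sd1_sd2(rr_ms: list[float]) -> tuple[float, float]:
    """Return (SD1, SD2) in ms from a sequence of RR intervals in ms."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    current, following = rr_ms[:-1], rr_ms[1:]
    # Rotate each (RR_n, RR_n+1) point by 45 degrees: one coordinate
    # lies across the identity line, the other along it.
    across = [(b - a) / math.sqrt(2) for a, b in zip(current, following)]
    along = [(b + a) / math.sqrt(2) for a, b in zip(current, following)]
    return statistics.stdev(across), statistics.stdev(along)


# Hypothetical RR intervals in milliseconds.
rr = [812.0, 790.0, 805.0, 830.0, 798.0, 815.0, 822.0]
sd1, sd2 = poincare_sd1_sd2(rr)
print(f"SD1 = {sd1:.1f} ms, SD2 = {sd2:.1f} ms")
```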

9

Wavelet analysis reveals non-stationary cardiovascular rhythms associated with delirium and deep sedation in ICU patients

Sreekanth, J.; Salgado-Baez, E.; Edel, A.; Gruenewald, E.; Piper, S. K.; Spies, C.; Balzer, F.; Boie, S. D.

2026-04-23 health informatics 10.64898/2026.04.22.26351455 medRxiv

Top 1%

0.7%

Routine ICU data offers valuable insights into daily physiological rhythms. While traditional methods assume these cycles maintain fixed periods and amplitudes, their inherent variability requires dynamic estimation of instantaneous trends. Wavelet transform effectively resolves circadian oscillations, especially for frequently measured vital parameters. We present novel extensions to the Continuous Wavelet Transform (CWT) power spectral analysis to better detect and segment subtle temporal patterns. Using this approach, we uncover hidden circadian patterns in cardiovascular vitals such as Heart Rate (HR) and Mean Blood Pressure (MBP) measured over five days in a retrospective cohort of 855 ICU patients. By quantifying non-stationary rhythms, we identified diurnal and semi-diurnal oscillations varying in period and power according to delirium and deep sedation. Notably, HR exhibits a clear diurnal and semi-diurnal rhythm when delirium is absent. Overall, our framework supports the CWT as a powerful tool for analyzing complex physiological signals, particularly vital signs. Crucially, our findings suggest that cardiovascular rhythm disruption can be associated with ICU-related delirium and deep sedation.

10

Outcome Prediction Models for Critically Ill Patients Using Small Routine Laboratory Datasets

Cao, X.; Hou, J.; Wei, X.; Wang, Q.

2026-04-27 emergency medicine 10.64898/2026.04.26.26351758 medRxiv

Top 1%

0.5%

We present a suite of foundational outcome prediction models for critically ill patients, developed using readily available, routine blood tests and advanced machine learning techniques. The input data of the models include complete blood counts (CBCs), metabolic panels, and additional biomarkers that assess liver and kidney function, coagulation status, and cardiac injury. The output is the predicted outcome at a given future horizon. For diagnoses, the length of the future horizon is set to zero, while it is set to a fixed time interval for prognoses. The training dataset in this study comprises clinical data from 332 ICU patients, augmented with 200 synthetic samples generated via a conditional diffusion model. Generative machine learning based data imputation and augmentation approaches yielded modest gains in predictive accuracy. However, substantial performance improvements were achieved through additional methods, including dimensionality and order reduction, SHAP based feature importance analysis, and a novel time series to image encoding strategy that enables the use of image based classifiers for temporal clinical data. Principal component analysis based order reduction produced measurable gains in outcome prediction, while the time series to image encoding proved particularly effective in mitigating small data limitations common in clinical research. Across all evaluation metrics (accuracy, precision, recall, F1 score, and AUROC), the prognostic models achieved performance exceeding 85%, with some models attaining AUROC scores above 90%. We developed a new model ensemble approach to optimize the predictive outcome. This ensemble modeling approach improves the overall prediction, pushing all assessment metrics over 90%.
This work establishes a robust and interpretable AI enabled diagnostic and prognostic toolkit for outcome predictions in critically ill patients and demonstrates a scalable workflow for developing high performing models from sparse healthcare datasets. The proposed framework is readily deployable in ICU environments with routine blood testing capabilities and serves as a foundation for future integration into digital twin systems for critical care.

11

Estimation of motion direction and speed using an organic-semiconductor retinal prosthetic in a blind retinae

Krishnan, A.; Deepak, C. S.; Narayan, K. S.

2026-04-23 neuroscience 10.64898/2026.04.23.720306 medRxiv

Top 1%

0.5%

For a vision system, estimating the speed and direction of movement at the retinal input stage is an essential function for survival in many organisms. Retinal ganglion cells specific to this movement function were identified using multi-electrode array recordings in neonatal chick retina. Motion-evoked "visual streaks" and direction selective responses were observed in chick ganglion cells upon sequential activation as a response to moving bar stimuli. These characteristics were preserved in the sub-retinal prosthetic consisting of a semiconductor polymer film coupled to the blind chick retina which generated spatiotemporal activity patterns resembling those in natural vision. The motion parameters of direction and speed inferred from these recordings demonstrate that polymer-based prostheses can evoke physiologically relevant activity patterns, suggesting their potential to restore motion perception in degenerative retinae.

12

Kernel Matrix Completion with Topological and Spectral Features for Multi-Modal Classification

Rinon, E. M.; Visaya, M. V.; Sambayan, R.

2026-04-22 bioinformatics 10.64898/2026.04.19.713528 medRxiv

Top 1%

0.5%

Kernel methods offer a robust framework for integrating multi-modal datasets into a unified representation, thereby facilitating more comprehensive data interpretation. In the presence of incomplete datasets, multiple kernel learning is employed to enhance the efficiency of data completion and integration. We investigate kernel-based approaches to address the incomplete-data problem with applications to yeast protein data. Biological data such as yeast proteins can be represented through multiple modalities, including gene expression profiles, amino acid sequences, three-dimensional structures, and protein interaction networks. We introduce a computational pipeline based on kernel matrix completion, in which topological data analysis (TDA) and persistent spectral analysis are incorporated into the classification setting. TDA captures geometric structure across scales while spectral descriptors reflect connectivity patterns through Laplacian eigenvalues. Kernel, topological, and spectral descriptors are used with support vector machines to discriminate between membrane and non-membrane yeast proteins. Empirical results show that the combined pipeline improves both kernel completion accuracy and ROC performance relative to baseline kernel-only approaches. The best-performing configuration achieves an ROC score of 0.8632 using the average of three kernels augmented with TDA features. These results demonstrate competitive performance relative to strong kernel-based baselines under incomplete data conditions. The proposed pipeline offers a unified framework for learning from incomplete heterogeneous data while enriching kernel representations with geometric and spectral information.

13

Generic versus personalized foot-ground contact models for predictive simulations of walking: Is personalization worth the effort?

Williams, S. T.; Li, G.; Fregly, B. J.

2026-04-21 bioengineering 10.64898/2026.04.16.719049 medRxiv

Top 1%

0.5%

Purpose: Quantification of walking function, including joint motions, ground reactions, and joint loads, outside the lab is a growing research area. Because only joint motions can currently be measured outside the lab, researchers are utilizing tracking optimizations of walking to estimate associated ground reactions and inverse dynamic joint loads. However, foot-ground contact models used in such optimizations have been generic rather than personalized, which may limit the accuracy of estimated ground reactions and joint loads. This study compares the predictive capabilities of generic versus personalized foot-ground contact models. Methods: Generic and personalized foot-ground contact models were evaluated in calibration and tracking optimizations performed using experimental walking data collected from three subjects in varying states of health. Foot-only calibration optimizations evaluated how well both models could reproduce experimental ground reaction and foot motion data while tracking both types of data simultaneously, while whole-body tracking optimizations evaluated how well both models could reproduce experimental ground reactions, joint motion, and joint load data while tracking only experimental joint motion data and achieving dynamic consistency. Results: For all three subjects and both types of optimizations, personalized foot-ground contact models reproduced experimental ground reaction, joint motion, and joint load data more accurately than generic foot-ground contact models. Conclusion: Personalized foot-ground contact models can improve the accuracy with which ground reactions and joint loads can be estimated via tracking optimizations of walking using only experimental motion data as inputs. Personalized models require little time and effort to calibrate using freely available software tools and should improve the accuracy of predictive simulations of walking as well.

14

Shannon Entropy Trajectories Reveal Between-Arm Distributional Structure Invisible to Standard Endpoint Analysis in Pooled ALS Clinical Trials

Rodriguez, A. M.; The Pooled Resource Open-Access ALS Clinical Trials Consortium

2026-04-22 neurology 10.64898/2026.04.20.26351319 medRxiv

Top 2%

0.5%

Standard analysis of amyotrophic lateral sclerosis (ALS) clinical trials evaluates therapeutic efficacy by comparing linear slopes of total ALS Functional Rating Scale (ALSFRS) scores between treatment arms. This approach compresses multidomain ordinal data into a single scalar trajectory, discarding distributional structure. When subgroup-level trends differ in timing or direction, such aggregation can attenuate or eliminate them, a phenomenon known as Simpson's paradox. Here we apply Shannon entropy, computed from item-level score distributions within each ALSFRS functional domain following the framework established in [8], to the PRO-ACT database, stratified by treatment arm (Active: n = 4,581; Placebo: n = 2,931; 19 monthly time points). The entropy trajectories of drug-treated and placebo populations diverge visibly and systematically across all four functional domains (Bulbar, Fine Motor, Gross Motor, Respiratory). In the Fine Motor domain, the placebo population reaches peak entropy at month 8 and reverses, while the active population does not peak until month 13, a five-month delay in the population's transit toward functional loss. This divergence is model-independent: it is present in the raw Shannon entropy trajectories before any dynamical model is applied. A permutation test shuffling patient-level arm labels (n = 1,000 permutations) confirms that the total integrated absolute divergence across all four domains exceeds the null distribution at p < 0.001 (observed: 4.48; null: 2.03 ± 0.33; 7.5 standard deviations above the null mean), with Fine Motor (p = 0.001) and Respiratory (p < 0.001) individually significant. The quantity that differs between arms, the shape and timing of the population's distributional evolution, does not exist as a measurable quantity in the total-score linear-slope framework used to evaluate these trials. 
Whether this signal reflects genuine treatment effects, compositional artifacts from pooling heterogeneous trials, or both cannot be determined from the anonymized public database alone. What can be determined is that the standard ALS clinical trial endpoint makes an implicit assumption, that the distributional information it discards is uninformative, and the present results demonstrate empirically that this assumption is false.
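
To make the central quantity concrete: Shannon entropy of a score distribution is H = -Σ p_k log2 p_k, where p_k is the fraction of patients with score k. ALSFRS items are scored from 0 to 4, so entropy is zero when every patient has the same score and largest when scores are spread evenly. The sketch below is a minimal illustration with made-up score lists, not the study's pipeline (which follows its reference [8]); the function name is hypothetical. It also shows why a population moving from uniformly high to uniformly low scores would pass through an entropy peak on the way.

```python
import math
from collections import Counter


def shannon_entropy(scores: list[int]) -> float:
    """Entropy in bits of the empirical distribution of item scores."""
    if not scores:
        raise ValueError("need at least one score")
    total = len(scores)
    counts = Counter(scores)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy + 0.0  # normalizes -0.0 to 0.0 for single-valued input


# Hypothetical item scores for one domain at three time points.
early = [4, 4, 4, 4, 4, 4, 4, 4]    # everyone near full function
middle = [4, 3, 3, 2, 2, 1, 1, 0]   # population spread across scores
late = [0, 0, 0, 0, 0, 0, 0, 0]     # everyone at functional loss

for label, scores in [("early", early), ("middle", middle), ("late", late)]:
    print(label, round(shannon_entropy(scores), 3))
```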

15

When Data Meets Practice: A Qualitative Study of Clinician Perspectives on Streaming Data in Mental Health

Tian, J.; Kurkova, V.; Wu, Y.; Adu, M.; Hayward, J.; Greenshaw, A. J.; Cao, B.

2026-04-25 psychiatry and clinical psychology 10.64898/2026.04.23.26351640 medRxiv

Top 2%

0.4%

Patient-generated streaming data from wearable and digital technologies is increasingly promoted as a means of supporting mental health monitoring and clinical decision-making. While patient acceptance of these technologies has been reported, clinician perspectives remain underexplored despite their central role in determining whether streaming data are meaningfully integrated into routine care. This study explored clinicians' experiences, as well as perceived facilitators and barriers, related to integrating patient-generated streaming data into routine mental health practice. A qualitative, exploratory interview study was conducted. Semi-structured interviews were conducted with 33 clinicians, including family physicians (n=11), psychiatrists (n=12), and psychologists (n=10). Data were analyzed using reflexive thematic analysis guided by Braun and Clarke's six-step approach. Six themes were identified. Clinicians described variable use of digital and streaming technologies, ranging from routine engagement to deliberate non-use. Streaming data were viewed as clinically valuable when they provided longitudinal and objective insights, identified physiological and behavioural pattern changes, and supported patient engagement. However, clinicians emphasized that clinical usefulness was contingent on interpretability, contextual information, and relevance to decision-making. Major barriers included poor integration with electronic medical records, time constraints, data volume, limited organizational support, and uncertainty regarding data reliability and validity. Clinicians also expressed persistent concerns about privacy, governance, and regulatory oversight, highlighting the need for clear safeguards and accountability structures. 
Clinicians view patient-generated streaming data as a promising adjunct to mental health care, particularly for capturing longitudinal change between visits. However, meaningful clinical integration remains constrained by usability, workflow, organizational, and regulatory challenges, as well as limited confidence in data interpretation. Addressing these barriers through improved system integration, interpretive support, validation, and governance will be essential for translating the potential of streaming data into routine clinical practice.

16

Real-time prospective (shadow mode) validation of an AI-based clinical decision support system for predicting 3-month functional outcome in acute stroke: the VALIDATE study protocol

Rubiera, M.; Bendszus, M.; Leker, R. R.; Hilbert, A.; Werren, I.; Lopez-Ramos, L. M.; Ayesta, M.; Nguyen, T. N. Q.; Bonekamp, S.; Sala, V.; Jubran, H.; Meza, C.; Shalabi, F.; Schwartzmann, Y.; Cano, D.; von Tottleben, M.; Kelleher, J.; Frey, D.

2026-04-27 neurology 10.64898/2026.04.26.26350937 medRxiv

Top 2%

0.3%

Introduction: Despite the proven benefits of reperfusion therapies in acute ischemic stroke, treatment decisions in the hyperacute phase remain complex and are rarely supported by individualized outcome predictions. Artificial intelligence (AI)-based clinical decision support systems (CDSS) offer potential real-time prognostic estimates, but prospective evidence of their feasibility and performance in routine clinical workflows is limited. Our aim is to prospectively evaluate real-time feasibility, usability, and predictive performance of an AI-based CDSS (VALIDATE-CDSS) for individualized outcome prediction in acute stroke care. Methods and analysis: Prospective, multicenter, observational study enrolling consecutive patients with acute ischemic stroke presenting to three tertiary stroke centers. Clinical management will follow standard practice at the discretion of treating physicians. In parallel, a dedicated researcher will collect patient data in real time and input them into the VALIDATE-CDSS using a mobile application, operating in shadow mode without influencing clinical decisions. The system will generate individualized predictions of 3-month functional outcome (modified Rankin Scale) for four treatment strategies (intravenous thrombolysis, endovascular thrombectomy, combined therapy, or no reperfusion) at three sequential time points: baseline clinical data, non-contrast CT, and CT angiography. The primary outcome is the real-world feasibility and usability of the VALIDATE-CDSS in the hyperacute stroke workflow. Secondary outcomes include predictive performance, agreement between model-suggested and actual treatments, incremental value with increasing data availability, and assessment of potential bias across predefined subgroups. 
This study will provide prospective real-world evidence on the implementation and clinical potential of AI-based decision support for personalized treatment selection in acute ischemic stroke. Ethics and dissemination: Patient enrollment began after approval from the ethics committees of all participating centers. Results will be disseminated through peer-reviewed open-access journals and conference presentations. Following open science principles, anonymized data and metadata will be made publicly available in the Zenodo repository upon study completion. Trial registration: ClinicalTrials.gov (NCT05622539).

17

DIVAID: Consistent division of atrial geometries from multimodal imaging according to the EHRA/EACVI 15-segment bi-atrial model

Goetz, C.; Eichenlaub, M.; Schmidt, K.; Wiedmann, F.; Invers Rubio, E.; Martinez Diaz, P.; Luik, A.; Althoff, T.; Schmidt, C.; Loewe, A.

2026-04-23 cardiovascular medicine 10.64898/2026.04.22.26351448 medRxiv

Top 2%

0.3%

The recently published EHRA/EACVI consensus statement on a standardized bi-atrial regionalization provides new opportunities for consistent regional analyses across patients, imaging modalities and clinical centers. To make this standardized regionalization widely accessible, we developed the open-source software DIVAID, which automatically divides bi-atrial geometries according to the proposed regions, ensuring consistency, reproducibility and operator independence. We evaluated the accuracy of the algorithm by comparing its results to manual expert annotations across 140 geometries from multiple modalities and centers. Veins were automatically clipped correctly in 81% and orifices annotated correctly in 100% of cases. The median (interquartile range; IQR) Dice similarity coefficient (DSC) for left atrial regions was 0.98 (0.96-1.00) for DIVAID-expert and 0.98 (0.94-1.00) for inter-expert comparisons. For right atrial geometries, DSC was higher for DIVAID-expert than for inter-expert comparisons at 0.90 (0.80-0.95) and 0.88 (0.74-0.94), respectively. To assess the accuracy of regional boundaries, we computed the mean average surface distance (MASD) for boundaries derived from automatic or manual annotations. The median (IQR) MASD between DIVAID and experts was 0.17 mm (0.03-0.78) and 1.93 mm (0.65-3.96) in the left and right atrium, respectively. To conclude, DIVAID robustly divides anatomically diverse bi-atrial geometries according to the 15-segment model, while outperforming cardiac experts in both speed and consistency, and demonstrating an accuracy of regional boundaries comparable to the spatial resolution of cardiac imaging modalities. By providing automated, consistent atrial regionalization, DIVAID enables large-scale, standardized regional analyses and data-driven investigation of harmonized, multi-dimensional datasets, which may advance atrial arrhythmia research and personalized treatment strategies.
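
For readers unfamiliar with the metric, the Dice similarity coefficient (DSC) reported in this abstract can be illustrated with a minimal Python sketch. The function name `dice` and the toy regions are hypothetical and are not taken from DIVAID; the sketch assumes each atrial region is represented as a set of surface-element indices.

```python
def dice(region_a, region_b):
    """Dice similarity coefficient between two collections of element indices."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty regions agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example (hypothetical data): two annotations sharing 3 of 4 elements each.
auto_region = {1, 2, 3, 4}
expert_region = {2, 3, 4, 5}
print(dice(auto_region, expert_region))  # 2*3 / (4+4) = 0.75
```

A DSC of 1.0 means the two annotations cover exactly the same elements, and 0.0 means they do not overlap at all.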

18

Identifying SARS-CoV-2 Lineages that Share the Same Relative Effective Reproduction Numbers

Musonda, R.; Ito, K.; Omori, R.; Ito, K.

2026-04-24 infectious diseases 10.64898/2026.04.22.26351531 medRxiv

Top 2%

0.3%

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continuously evolved since its emergence in the human population in 2019. As of 1st August 2025, more than 1,700 Omicron subvariants have been designated by the Pango nomenclature system. The Pango nomenclature system designates a new lineage based on genetic and epidemiological information of SARS-CoV-2 strains. However, there is a possibility that strains that have similar genetic backgrounds and the same phenotype are given different Pango lineage names. In this paper, we propose a new algorithm, called FindPart-w, which can identify groups of viral lineages that share the same relative effective reproduction numbers. We introduced a new lineage replacement model, called the constrained RelRe model, which constrains groups of lineages to have the same relative effective reproduction numbers. The FindPart-w algorithm searches for the equality constraints that minimise the Akaike Information Criterion of constrained RelRe models. Using hypothetical observation count data created by simulation, we found that the FindPart-w algorithm can identify groups of lineages having the same relative effective reproduction number in a practical computational time. Applying FindPart-w to real-world data of time-stamped lineage counts from the United States, we found that the Pango lineage nomenclature system may have given different lineage names to SARS-CoV-2 strains even if they have the same relative effective reproduction number and similar genetic backgrounds. In conclusion, this study showed that viruses that had the same relative effective reproduction number were identifiable from temporal count data of viral sequences. These findings will contribute to the future development of lineage designation systems that consider both genetic backgrounds and transmissibilities of lineages.
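
The model-selection idea mentioned in this abstract, choosing the equality constraints that minimise the Akaike Information Criterion (AIC = 2k - 2 ln L), can be sketched in a few lines of Python. This is not the FindPart-w algorithm itself; the candidate names, log-likelihoods, and parameter counts below are invented purely for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (maximised log-likelihood, number of free parameters).
# Constraining two lineages to share one relative reproduction number removes
# one free parameter at the cost of a slightly lower likelihood.
candidates = {
    "unconstrained": (-100.0, 3),
    "two lineages merged": (-100.4, 2),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the merged model wins here because its AIC is lower
```

In this toy case the constrained model is preferred because the small loss in likelihood is outweighed by the saving of one parameter.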

19

Robustly Quantifying Uncertainty in International Avian Influenza A(H5N1) Infection Fatality Ratios

Gada, L.; Afuleni, M. K.; Noble, M.; House, T.; Finnie, T.

2026-04-23 public and global health 10.64898/2026.04.22.26351373 medRxiv

Top 2%

0.3%

Knowing the mortality rates associated with infection by a pathogen is essential for effective preparedness and response. Here, harnessing the flexibility of a Bayesian approach, we produce an estimate of the Infection Fatality Ratio (IFR) for A(H5N1) conditional on explicit assumptions, and quantify the uncertainty thereof. We also apply the method to first-wave COVID-19 data up to March 2020, demonstrating the estimates that could be obtained were the model available then. Our analysis uses World Development Indicators (WDI) from the World Bank, the A(H5N1) WHO confirmed cases and deaths tracker by country (2003-2024), and COVID-19 cases and deaths data from Johns Hopkins University (January and February 2020). Since infectious disease dynamics are typically influenced by local socio-economic factors rather than political borders, individual countries are placed within clusters of countries sharing similar WDIs relevant to respiratory viral diseases, with clusters derived by performing Hierarchical Clustering. To estimate the IFR, we fit a Negative Binomial Bayesian Hierarchical Model for A(H5N1) and COVID-19 separately. We explicitly modelled key unobserved parameters with informative priors from expert opinion and literature. By modelling underreporting, our analysis suggests lower fatality (15.3%) compared to WHO's Case Fatality Ratio estimate (54%) on lab-confirmed cases. However, credible intervals are wide ([0.5%, 64.2%] 95% CrI). Therefore, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%. Our approach also returns a COVID-19 IFR estimate of 2.8% with [2.5%, 3.1%] 95% CrI which is consistent with literature.

20

MedSAM2-CXR: A Box-Latent Framework for Chest X-ray Classification and Report Generation

Hakata, Y.; Oikawa, M.; Fujisawa, S.

2026-04-22 health informatics 10.64898/2026.04.20.26351338 medRxiv

Top 2%

0.3%

Who is affected: In Japan, approximately 100 million chest radiographs (CXRs) are acquired annually, while only about 7,000 board-certified diagnostic radiologists practice nationwide (Japan Radiological Society workforce statistics; OECD Health Statistics, most recent available year). This implies an average workload exceeding 10,000 imaging studies per radiologist per year if all CXRs were attributed to board-certified diagnostic radiologists (an upper-bound estimate, because in practice many CXRs are primarily read by non-radiologist physicians). In settings such as night shifts, weekends, remote islands, and regional care networks, non-radiologist physicians frequently act as primary readers. Despite strong demand for AI assistance, existing systems are typically limited by one of three shortcomings -- poor cross-institutional generalization, limited interpretability, or inability to generate draft reports -- and consequently see limited clinical deployment. What we built: We propose a Box-Latent Trinity that embeds each image as a hyperrectangle parameterized by a center c and a radius r, rather than as a single point in a latent space. We further introduce BL-TTA (Box-Latent Test-Time Augmentation), which approximately closes the train-inference gap (exact in the N → ∞ limit; N = 8 suffices in practice) by averaging predictions over samples drawn from within the latent box at inference time. Both components are implemented on top of the frozen MedSAM2 medical imaging foundation model. A single box representation simultaneously supports three functions: (A) theoretically grounded source selection, (B) device-invariant augmentation, and (C) case-based retrieval-augmented generation (RAG). Each prediction is accompanied by retrieved similar prior cases, a calibrated confidence estimate, and clinical-guideline references.
How well it performs: On the Open-i CXR corpus (2,954 image-report pairs) under a patient-level 80/10/10 split and 5-seed reproducibility, the full system B5 achieves macro area under the receiver-operating-characteristic curve (macro-AUROC) 0.639 (best-seed test; 5-seed mean 0.626, Table 2; absolute +0.015 over the strongest same-backbone baseline, Merlin-style 0.624), elementwise accuracy 0.753 (absolute +0.072 over Merlin-style 0.681 -- equivalent to approximately 7 fewer label-level errors per 100 (label, image) predictions across 14 finding labels, not per 100 images), and report label-F1 0.435 (absolute +0.086, relative +25% over the strongest same-backbone report-generation baseline, Bootstrapping-style 0.349). Under simulated pixel-space device-shift intensities up to twice the training distribution, AUROC degrades by only 0.014. Brier score (macro) is 0.061; Cohen's κ between two independent rule-based label extractors is 0.702 (substantial agreement); the box radius yields an out-of-distribution (OOD) detection AUROC of 0.595; and the framework provides four structural explainable-AI (XAI) outputs -- retrieved similar cases, confidence tier, per-axis uncertainty, and visual saliency -- which we jointly quantify in a single CXR study, a combination that, to our knowledge, has not been reported previously. Path to deployment: Because the complete experiment can be reproduced in under two hours on a consumer-grade GPU (NVIDIA RTX 4060, 8 GB VRAM), the framework can run on compute resources already available at typical healthcare institutions.
The approach thus supports the practical delivery of evidence-grounded diagnostic support to night shifts, remote-island care, and secondary readings in health checkups -- settings in which a board-certified radiologist is not locally available. One-sentence summary: Reproducible end-to-end in under two hours on a single consumer-grade GPU, the proposed framework outperforms the strongest same-backbone medical-AI baselines on three principal metrics, maintains accuracy under simulated device shifts, and automatically drafts evidence-grounded radiology reports, offering a reproducible and compute-efficient direction toward reducing the reading burden of Japanese radiologists, subject to external validation.
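
The test-time averaging described in this abstract (predictions averaged over N samples drawn from a latent box with center c and radius r) might look roughly like the Python sketch below. This is an illustration under stated assumptions, not the authors' implementation: uniform sampling within the box, the function name `bl_tta`, and the stand-in classifier `toy_predict` are all hypothetical.

```python
import random


def bl_tta(center, radius, predict, n_samples=8, seed=0):
    """Average predict(z) over points z sampled inside the box [c - r, c + r].

    Uniform sampling is an assumption of this sketch; the abstract only says
    that samples are drawn from within the latent box.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [c + rng.uniform(-r, r) for c, r in zip(center, radius)]
        total += predict(z)
    return total / n_samples


def toy_predict(z):
    """Hypothetical classifier head: a linear score clamped to [0, 1]."""
    score = 0.5 + 0.1 * sum(z)
    return min(1.0, max(0.0, score))


# Usage with a made-up 2-D latent box; N = 8 mirrors the value in the abstract.
print(bl_tta([1.0, 2.0], [0.5, 0.5], toy_predict, n_samples=8))
```

With a zero radius the box collapses to a single point, and the averaged prediction reduces to an ordinary point prediction at the center.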